[feat][evaluation] experiment retry mode by lsy357 · Pull Request #434 · coze-dev/coze-loop

lsy357 · 2026-02-26T06:14:11Z

What type of PR is this?

feat

Check the PR title

This PR title match the format: [<type>][<scope>] <description>. For example: [fix][backend] flaky fix
The description of this PR title is user-oriented and clear enough for others to understand.
Add documentation if the current PR requires user awareness at the usage level.
This PR is written in English. PRs not in English will not be reviewed.

(Optional) Translate the PR title into Chinese

(Optional) More detailed description for this PR(en: English/zh: Chinese)

en:
zh(optional):

(Optional) Which issue(s) this PR fixes

codecov · 2026-02-27T03:20:09Z

Codecov Report

❌ Patch coverage is 74.22840% with 167 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
...luation/domain/service/expt_run_item_event_impl.go	5.12%	35 Missing and 2 partials ⚠️
...ion/domain/service/expt_run_scheduler_mode_impl.go	93.59%	14 Missing and 7 partials ⚠️
backend/infra/lock/lock.go	64.15%	19 Missing ⚠️
...ation/domain/service/expt_manage_execution_impl.go	81.52%	8 Missing and 9 partials ⚠️
...aluation/infra/repo/experiment/expt_item_result.go	0.00%	16 Missing ⚠️
...d/modules/evaluation/application/experiment_app.go	53.12%	11 Missing and 4 partials ⚠️
...valuation/application/convertor/experiment/expt.go	31.57%	9 Missing and 4 partials ⚠️
backend/modules/evaluation/domain/entity/expt.go	60.00%	4 Missing and 4 partials ⚠️
...es/evaluation/domain/service/expt_run_item_impl.go	42.85%	7 Missing and 1 partial ⚠️
...s/evaluation/infra/repo/experiment/expt_run_log.go	55.55%	2 Missing and 2 partials ⚠️
... and 4 more

❌ Your patch status has failed because the patch coverage (74.22%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.

@@            Coverage Diff             @@
##             main     #434      +/-   ##
==========================================
+ Coverage   72.88%   72.89%   +0.01%     
==========================================
  Files         624      624              
  Lines       62323    62897     +574     
==========================================
+ Hits        45421    45846     +425     
- Misses      13723    13846     +123     
- Partials     3179     3205      +26

Flag	Coverage Δ
unittests	`72.89% <74.22%> (+0.01%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines	Coverage Δ
backend/modules/evaluation/domain/entity/event.go	`63.63% <100.00%> (+3.63%)`	⬆️
...ckend/modules/evaluation/domain/entity/expt_run.go	`64.64% <ø> (ø)`
.../modules/evaluation/domain/entity/expt_template.go	`98.01% <ø> (ø)`
backend/modules/evaluation/domain/entity/param.go	`64.28% <ø> (ø)`
...ules/evaluation/domain/service/expt_manage_impl.go	`72.15% <100.00%> (+0.11%)`	⬆️
...aluation/domain/service/expt_run_item_turn_impl.go	`80.06% <50.00%> (ø)`
...on/domain/service/expt_run_scheduler_event_impl.go	`75.21% <75.00%> (+0.31%)`	⬆️
.../application/convertor/experiment/expt_template.go	`91.60% <62.50%> (-0.41%)`	⬇️
...nd/modules/evaluation/domain/entity/expt_result.go	`48.81% <25.00%> (-0.78%)`	⬇️
...s/evaluation/infra/repo/experiment/expt_run_log.go	`76.66% <55.55%> (-12.99%)`	⬇️
... and 9 more

... and 3 files with indirect coverage changes

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 34b145e...3b56e59. Read the comment docs.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

CozeLoop

Overall this PR introduces experiment-level retry modes (retry all / retry selected items) and the necessary plumbing across IDL, application, domain, scheduler, and infra layers. I focused on correctness, concurrency, error handling, and IDL integration.

Key findings:

Found two IDL annotation typos (api.boy instead of api.body) that will prevent item_retry_num from being bound at the HTTP/OpenAPI layer.
Identified a small error-handling issue where a nil error is wrapped, which can be misleading when debugging.
Suggested tightening invariants around run-log item IDs to avoid accidental duplicates in a single retry request.

The rest of the changes (locking primitives, scheduler modes, retry caps, repo/DAO extensions, and error codes) look conceptually sound and consistent with the existing evaluation pipeline. Please see inline comments for details and concrete fixes.

CozeLoop · 2026-02-27T08:10:13Z

    41: optional bool enable_weighted_score (api.body = 'enable_weighted_score', go.tag='json:"enable_weighted_score"')
    42: optional map<i64, double> evaluator_score_weights (api.body = 'evaluator_score_weights', go.tag='json:"evaluator_score_weights"')
    43: optional i64 expt_template_id (api.body='expt_template_id',api.js_conv='true', go.tag='json:"expt_template_id"')
+    45: optional i32 item_retry_num (api.boy = 'item_retry_num')


[Must Fix] Thrift annotation typo prevents HTTP binding for item_retry_num.

Problem: This field uses api.boy instead of api.body. Hertz/AGW generators only recognize api.body, so item_retry_num from the request body will not be bound into CreateExperimentRequest/SubmitExperimentRequest. The feature will appear to be ignored at the API layer.

Recommendation: Change api.boy = 'item_retry_num' to api.body = 'item_retry_num' for all item_retry_num fields in this Thrift file.

Reference: See Hertz Thrift annotations in your own IDL guidelines and CloudWeGo docs for api.body usage (similar to other fields in this struct).

CozeLoop · 2026-02-27T08:10:24Z

    20: optional i32 item_concur_num (api.body = 'item_concur_num')
    22: optional common.RuntimeParam target_runtime_param (api.body = 'target_runtime_param')

+    45: optional i32 item_retry_num (api.boy = 'item_retry_num')


[Must Fix] OpenAPI Thrift annotation typo for item_retry_num.

Problem: This field uses api.boy instead of api.body. As a result, the OpenAPI layer will not bind item_retry_num from the HTTP request body into SubmitExperimentOApiRequest, so clients cannot configure retry times via OpenAPI.

Recommendation: Replace api.boy = 'item_retry_num' with api.body = 'item_retry_num' here to align with other fields and the Hertz binding expectations.

Reference: CloudWeGo/Hertz Thrift annotations for HTTP binding (same pattern you use for item_concur_num and other body fields).

CozeLoop · 2026-02-27T08:10:40Z

+		rl, err = e.runLogRepo.Get(ctx, exptID, runID)
+		if err != nil {
+			return 0, false, err
+		}
+
+		if rl == nil {
+			return 0, false, errorx.Wrapf(err, "target runlog %v not found, expt_id: %v", runID, exptID)


[Should Fix] Wrapping a nil error here loses context and is misleading.

Problem: In this branch, err is guaranteed to be nil (you already returned on non-nil errors from runLogRepo.Get above). Calling errorx.Wrapf(err, ...) with a nil cause produces an error whose underlying cause is effectively empty and can confuse error inspection/log analysis.

Recommendation: Use a constructor that does not expect a non-nil cause, for example:

if rl == nil { return 0, false, errorx.New("target runlog %v not found, expt_id: %v", runID, exptID) }

or an equivalent errorx.NewByCode + extra message if you want to reuse a typed code.

Reference: Go Code Review Comments – avoid wrapping nil errors; keep error chains meaningful for errors.Is/As and log inspection.

CozeLoop · 2026-02-27T08:10:53Z

+func (e *ExptRunLog) AppendItemIDs(itemIDs []int64) error {
+	if e == nil {
+		return errorx.New("ExptRunLog AppendItemIDs must init first")
+	}
+	exists := make(map[int64]bool)
+	for _, chunk := range e.ItemIds {
+		for _, itemID := range chunk.ItemIDs {
+			exists[itemID] = true
+		}
+	}
+	rlItems := ExptRunLogItems{CreateAt: gptr.Of(time.Now().Unix())}
+	for _, itemID := range itemIDs {
+		if exists[itemID] {
+			return errorx.NewByCode(errno.EvalItemAlreadyRetryingCode, errorx.WithExtraMsg(fmt.Sprintf("existed item_id: %v", itemID)))
+		} else {
+			rlItems.ItemIDs = append(rlItems.ItemIDs, itemID)
+		}
+	}
+	e.ItemIds = append(e.ItemIds, rlItems)


[Suggestion] Guard against duplicate item IDs within the same retry request.

Problem: exists is built only from the already-persisted ItemIds chunks. If itemIDs passed into AppendItemIDs contains duplicates in the same call (e.g. [1, 1, 2] when ItemIds is currently empty), both entries will be appended because the exists map is never updated for the new IDs. This can lead to redundant scheduling work and makes the run log harder to reason about.

Recommendation: Track newly added IDs in the same exists map so the method enforces uniqueness across both historical and in-request items, for example:

for _, itemID := range itemIDs { if exists[itemID] { return errorx.NewByCode(errno.EvalItemAlreadyRetryingCode, errorx.WithExtraMsg(fmt.Sprintf("existed item_id: %v", itemID))) } exists[itemID] = true rlItems.ItemIDs = append(rlItems.ItemIDs, itemID) }

This keeps the invariant "each itemID appears at most once per run log" regardless of client input.

Reference: Go Code Review Comments – keep invariants explicit and avoid silent duplication of identifiers.

CozeLoop · 2026-02-27T08:11:10Z

 github.com/google/renameio v0.1.0/go.mod h1:KWCgfxg9yswjAJkECMjeO8J8rahYeXnNhOm40UhjYkI=
 github.com/google/s2a-go v0.1.8 h1:zZDs9gcbt9ZPLV0ndSyQk6Kacx2g/X+SKYovpnz3SMM=
 github.com/google/s2a-go v0.1.8/go.mod h1:6iNWHTpQ+nfNRN5E00MSdfDwVesa8hhS32PhPO8deJA=
+github.com/google/subcommands v1.2.0 h1:vWQspBTo2nEqTUFita5/KeEWlUL8kQObDFbub/EN9oE=


[No Risk] No issues found with this dependency change.

The added checksum for github.com/google/subcommands v1.2.0 is consistent with the existing go.mod entry and does not introduce any obvious risks for the backend. Version choice and scope of usage look reasonable for CLI-style helpers.

CozeLoop · 2026-02-27T08:23:11Z

[No Risk] DAO extensions for batch queries and raw run-log scanning look fine.

MGetItemResults/MGetItemTurnResults and the RawFilter branch in ScanItemRunLogs use appropriate query builders, respect the existing index (uk_expt_run_item_turn), and keep pagination semantics intact. Error wrapping is consistent with the rest of the DAO layer.

CozeLoop · 2026-02-27T08:23:27Z

[No Risk] New error codes for retry flows are well scoped.

EvalItemAlreadyRetrying and ExperimentIsCompleting are clearly named, have stable, non-overlapping codes, and their messages match how they are used in the domain layer (e.g., guarding duplicate retry items and runs in completing state). They are marked as not affecting stability, which aligns with their semantics.

CozeLoop · 2026-02-27T08:23:39Z

[No Risk] Domain IDL changes for item_retry_num are consistent with the model.

Introducing item_retry_num on Experiment and ExptFieldMapping fits naturally into the existing evaluation configuration and is wired through the convertors and templates. Field IDs avoid collisions and preserve backward compatibility.

CozeLoop · 2026-02-27T08:23:50Z

[No Risk] Extending ExptRunMode with explicit retry modes improves clarity.

Adding EvaluationModeUnknown, EvaluationModeRetryAll, and EvaluationModeRetryItems makes the run-mode enum more expressive and is consumed consistently across the scheduler and application layers. No issues spotted with numeric values or compatibility.

CozeLoop · 2026-02-27T08:24:19Z

[No Risk] Deleting this temporary file_name artifact is safe.

This looks like a stray CSV header/sample file that does not belong in the domain service package. Removing it reduces noise in the repo and has no impact on runtime behavior.

lsy357 added 18 commits February 6, 2026 14:11

feat(evaluation): retry extension

cc3666c

feat(evaluation): retry gen

e290ac1

feat(evaluation): retry extension

a28c726

Merge branch 'main' into feat/evalexpt

6c7050f

fix(evaluation): event mode

7975d8b

fix(evaluation): retryall

00e1b09

fix(evaluation): retry

08a4c53

Merge branch 'main' into feat/evalexpt

4d4dd2f

fix(evaluation): custom raw err

0172752

fix(evaluation): errmsg

aa2fe89

fix(evaluation): existed turn result

63b9578

fix(evaluation): scanitems

014bdb7

fix(evaluation): scanitems

cacd996

fix(evaluation): scancond

1842637

fix(evaluation): gorm sql

1123075

feat(evaluation): itemretry

f2fc240

fix(evaluation): ut

dfc48ea

fix(evaluation): ut

c12983a

lsy357 changed the title ~~[feat][evlauation] experiment retry mode~~ [feat][evaluation] experiment retry mode Feb 26, 2026

lsy357 added 11 commits February 26, 2026 14:38

fix(evaluation): experiment template itemretrynum

68b195f

fix(evaluation): runid

0e012df

fix(evaluation): appenditemids panic

0c61ec5

fix(evaluation): unlock

5e07bad

fix(evaluation): completing lock

c6c71eb

fix(evaluation): retryitems epxtend

ffff714

fix(evaluation): evaluator reenter

0e73f1f

fix(evaluation): expt event expiration

6ded8f8

Merge branch 'main' into feat/evalexpt

9b293f0

fix(evaluation): mockgen

7f838b4

fix(evaluation): ut

b1f2846

fix(evaluation): IgnoreExistedResult

0cfafd5

tpfz previously approved these changes Feb 27, 2026

View reviewed changes

fix(evaluation): itemretrynum

11fd029

lsy357 dismissed tpfz’s stale review via 11fd029 February 27, 2026 06:50

lsy357 added 2 commits February 27, 2026 14:58

Merge branch 'main' into feat/evalexpt

e3b1e83

fix(evaluation): ListEvaluationSetItems return

474e7e1

tpfz previously approved these changes Feb 27, 2026

View reviewed changes

fix(evaluation): ut

3b56e59

lsy357 dismissed tpfz’s stale review via 3b56e59 February 27, 2026 07:31

tpfz approved these changes Feb 27, 2026

View reviewed changes

dsf86 approved these changes Feb 27, 2026

View reviewed changes

lsy357 merged commit caae357 into main Feb 27, 2026
11 checks passed

lsy357 deleted the feat/evalexpt branch February 27, 2026 07:53

CozeLoop reviewed Feb 27, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[feat][evaluation] experiment retry mode#434

[feat][evaluation] experiment retry mode#434
lsy357 merged 34 commits into
mainfrom
feat/evalexpt

lsy357 commented Feb 26, 2026

Uh oh!

codecov Bot commented Feb 27, 2026 •

edited

Loading

Uh oh!

Uh oh!

CozeLoop left a comment

Uh oh!

CozeLoop Feb 27, 2026

Uh oh!

CozeLoop Feb 27, 2026

Uh oh!

CozeLoop Feb 27, 2026

Uh oh!

CozeLoop Feb 27, 2026

Uh oh!

CozeLoop Feb 27, 2026

Uh oh!

CozeLoop Feb 27, 2026

Uh oh!

CozeLoop Feb 27, 2026

Uh oh!

CozeLoop Feb 27, 2026

Uh oh!

CozeLoop Feb 27, 2026

Uh oh!

CozeLoop Feb 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

lsy357 commented Feb 26, 2026

What type of PR is this?

Check the PR title

(Optional) Translate the PR title into Chinese

(Optional) More detailed description for this PR(en: English/zh: Chinese)

(Optional) Which issue(s) this PR fixes

Uh oh!

codecov Bot commented Feb 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Uh oh!

CozeLoop left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

codecov Bot commented Feb 27, 2026 •

edited

Loading